Skeleton-based human action recognition has attracted a lot of research attention during the past few years. Recent works attempted to utilize recurrent neural networks to model the temporal dependencies between the 3D positional configurations of human body joints for better analysis of human activities in the skeletal data. The proposed work extends this idea to the spatial domain as well as the temporal domain, to better analyze the hidden sources of action-related information within the human skeleton sequences in both of these domains simultaneously. Based on the pictorial structure of Kinect's skeletal data, an effective tree-structure based traversal framework is also proposed. In order to deal with the noise in the skeletal data, a new gating mechanism within the LSTM module is introduced, with which the network can learn the reliability of the sequential data and accordingly adjust the effect of the input data on the updating procedure of the long-term context representation stored in the unit's memory cell. Moreover, we introduce a novel multi-modal feature fusion strategy within the LSTM unit in this paper. The comprehensive experimental results on seven challenging benchmark datasets for human action recognition demonstrate the effectiveness of the proposed method.